Data Quality Models for High Volume Transaction Streams: A Case Study
Authors
Abstract
An important problem in data mining is detecting significant and actionable changes in large, complex data sets. Although a variety of change detection algorithms have been developed, in practice it can be difficult to scale them to large data sets because of the heterogeneity of the data. In this paper, we describe a case study involving payment card data in which we built and monitored a separate change detection model for each cell in a multi-dimensional data cube. We describe a system, in operation for the past two years, that builds and monitors over 15,000 separate baseline models, as well as the process used to generate and investigate alerts from these baselines.
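The abstract does not say what kind of baseline each cell holds. As a minimal sketch, assuming each cube cell tracks one numeric metric (for example, a daily rate) and its baseline is a running mean and standard deviation with a z-score alert threshold, the "one model per cube cell" idea might look like the following. The class and parameter names (`CubeOfBaselines`, `threshold`, `min_history`) and the cell coordinates are hypothetical, not taken from the paper.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical cell key: one value per cube dimension, e.g. (region, issuer, metric).
Cell = Tuple[str, ...]


@dataclass
class CellBaseline:
    """Running mean/variance for a single cube cell (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; undefined (0.0) until two observations exist.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class CubeOfBaselines:
    """One independent baseline model per cell; cells are created on first use."""

    def __init__(self, threshold: float = 3.0, min_history: int = 5) -> None:
        self.threshold = threshold      # assumed z-score cutoff for an alert
        self.min_history = min_history  # observations required before alerting
        self.cells: Dict[Cell, CellBaseline] = defaultdict(CellBaseline)

    def observe(self, cell: Cell, value: float) -> bool:
        """Score value against its cell's baseline, then fold it into that baseline.

        Returns True when the value deviates by more than `threshold` standard
        deviations from the cell's history.
        """
        base = self.cells[cell]
        alert = False
        if base.n >= self.min_history:
            sd = base.std()
            if sd > 0 and abs(value - base.mean) / sd > self.threshold:
                alert = True
        base.update(value)
        return alert


cube = CubeOfBaselines(threshold=3.0, min_history=5)
cell = ("region_A", "issuer_X", "decline_rate")  # hypothetical cube coordinates
for v in [0.050, 0.052, 0.049, 0.051, 0.050]:
    cube.observe(cell, v)
print(cube.observe(cell, 0.120))  # → True
```

Keeping each cell's state separate means a cell with naturally noisy data does not mask a sharp change in a quiet one, which is one way to handle the heterogeneity the abstract mentions. This sketch folds every value, including alerting ones, into the baseline; a deployed system might instead exclude flagged values so that a persistent problem does not become the new normal.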
Similar resources
Monitoring Data Quality for Very High Volume
In this paper we describe a new methodology, called change detection using cubes of models (CDCM), for detecting data quality problems in high volume transaction streams. We also describe how this system is deployed at Visa and two case studies that occurred during its first year of operation. ∗This work was supported in part by the Visa International Data Interoperability Program and the U.S. A...
A Framework for Discovery and Diagnosis of Behavioral Transitions in Event-streams
Data stream mining techniques can be used to track user behaviors as users attempt to achieve their goals. Quality metrics over stream-mined models identify potential changes in user goal attainment. When the quality of some data-mined models, as defined by quality metrics, varies significantly from that of nearby models, the user's behavior is automatically flagged as a potentially significant beh...
Hydrological Drought Forecasting Using Stochastic Models (Case Study: Karkheh watershed Basin)
Hydrological drought refers to persistently low discharge and volume of water in streams and reservoirs, lasting months or years. Hydrological drought is a natural phenomenon, but it may be exacerbated by human activities. Hydrological droughts are usually related to meteorological droughts, and their recurrence interval varies accordingly. This study seeks to identify a stochastic model (o...
Prediction of the Type and Amount of Surface Water Pollutants using Time Series Models (ARIMA) and L-THIA Model (Case Study: Namrood Sub-Basin, Hablehrood Watershed)
Because non-point source pollution plays an important role in water resources management, this study applied time series modeling to forecast water quality parameters and used the L-THIA model (a non-point source pollution model) to estimate water pollutants. The purpose of this study was to compare the results of the L-THIA model and ARIMA models in the Namrood sub-basin located in ...
Simulation of groundwater quality parameters using ANN and ANN+PSO models (Case study: Ramhormoz Plain)
One of the main aims of water resource planners and managers is to estimate and predict groundwater quality parameters so that they can make management decisions. Many models have been developed for this purpose, aiming to improve management and maintain water quality. Most of these models require input parameters that are either scarcely available or time-consuming and expensive to...